80 research outputs found
Bandwidth extension of narrowband speech
Recently, 4G mobile phone systems have been
designed to process wideband speech signals sampled
at 16 kHz. However, most of the mobile and classical
phone network, and current 3G mobile phones, still
process narrowband speech signals sampled at 8 kHz.
In the near future, all these systems must coexist.
Therefore, a wideband speech signal (with a bandwidth
of up to 7.2 kHz) must sometimes be estimated from an
available narrowband one (whose frequency band is
300-3400 Hz). In this work, different audio bandwidth
extension techniques have been implemented and
evaluated. First, a simple non-model-based algorithm
(interpolation algorithm) has been implemented. Second,
a model-based algorithm (linear mapping) has been
designed and evaluated against the first one. Several
CMOS (Comparison Mean Opinion Score) [6] listening
tests show that the linear mapping algorithm clearly
outperforms the interpolation algorithm, with results
very close to those obtained for the original wideband
speech signal. Postprint (published version)
Codificación APVQ de voz en banda ancha usando asignación dinámica de bits
This paper describes a coding scheme for broadband speech. It can be seen as a vector extension of a conventional ADPCM encoder. In this scheme, the signal vector is formed from one sample of the normalized prediction error of each subband and is then vector quantized. The scheme combines the advantages of scalar prediction with those of vector quantization (VQ). The high vector dimensionality is handled by a multi-VQ, which requires the vector to be divided into subvectors beforehand and an adequate bit assignment among them. The scheme copes well with signals of large dynamic range, such as broadband speech. Predictor and codebook designs are discussed, and some speech prediction and coding results are reported. Peer Reviewed. Postprint (published version)
APVQ encoder applied to wideband speech coding
The paper describes a coding scheme for broadband speech (sampling frequency 16 kHz). The authors present a wideband speech encoder called APVQ (adaptive predictive vector quantization), which combines subband coding, vector quantization and adaptive prediction. The speech signal is split into 16 subbands by means of a QMF filter bank, so every subband is 500 Hz wide. The APVQ encoder can be seen as a vector extension of a conventional ADPCM encoder. In this scheme, the signal vector is formed from one sample of the normalized prediction error signal of each subband and is then vector quantized. The prediction error signal is normalized by its gain, and the normalized prediction error signal is the input to the VQ, so the scheme amounts to an adaptive gain-shape VQ. The APVQ encoder combines the advantages of scalar prediction with those of vector quantization. The authors evaluate wideband speech coding in the range from 1.5 to 2 bits/sample, which corresponds to a coding rate from 24 to 32 kbps. Peer Reviewed. Postprint (published version)
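The figures quoted in this abstract follow from simple arithmetic, which the short Python sketch below checks. The function names are hypothetical and chosen only for illustration; the sketch assumes the 16 subbands are of equal width and together cover the full band from 0 to half the sampling frequency.

```python
def subband_width_hz(sampling_rate_hz, num_subbands):
    # A real signal sampled at fs spans fs/2 Hz; assume equal-width subbands.
    return (sampling_rate_hz / 2) / num_subbands


def coding_rate_kbps(bits_per_sample, sampling_rate_hz):
    # Bit rate = bits per sample times samples per second, expressed in kbps.
    return bits_per_sample * sampling_rate_hz / 1000


print(subband_width_hz(16000, 16))   # 500.0
print(coding_rate_kbps(1.5, 16000))  # 24.0
print(coding_rate_kbps(2, 16000))    # 32.0
```

The same relation gives 16 kbps at 1 bit/sample.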
Codificación APVQ de voz en banda ancha para velocidades entre 16 y 32 KBPS
This paper describes a coding scheme for broadband speech (sampling frequency 16 kHz). We present a wideband speech encoder called APVQ (Adaptive Predictive Vector Quantization). It combines subband coding, vector quantization and adaptive prediction, as shown in Fig. 1. The speech signal is split into 16 subbands by means of a QMF filter bank, so every subband is 500 Hz wide. The APVQ encoder can be seen as a vector extension of a conventional ADPCM encoder. In this scheme, the signal vector is formed from one sample of the normalized prediction error signal of each subband and is then vector quantized. The prediction error signal is normalized by its gain, and the normalized prediction error signal is the input to the VQ, so the scheme amounts to an adaptive gain-shape VQ. The APVQ encoder combines the advantages of scalar prediction with those of vector quantization. We evaluate wideband speech coding in the range from 1 to 2 bits/sample, which corresponds to a coding rate from 16 to 32 kbps. Peer Reviewed. Postprint (published version)
Codificación APVQ-extendida de voz de banda ancha
This paper describes a coding scheme for broadband speech. It can be seen as a vector extension of a conventional ADPCM encoder. In this scheme, the signal vector is formed from one sample of the normalized prediction error of each subband and is then vector quantized. The scheme combines the advantages of scalar prediction with those of vector quantization (VQ). The high vector dimensionality is handled by a multi-VQ, which requires the vector to be divided into subvectors beforehand and an adequate bit assignment among them. The scheme copes well with signals of large dynamic range, such as broadband speech. Peer Reviewed. Postprint (published version)
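As a rough illustration of the multi-VQ idea in this abstract, the Python sketch below splits a vector into consecutive subvectors and quantizes each against its own codebook, with a codebook of 2^b entries standing for an assignment of b bits. It is a generic split-VQ sketch under assumptions, not the authors' encoder: the function names, the toy codebooks and the fixed bit assignment are hypothetical, and the abstract does not say how the subvector division or the bit assignment is chosen.

```python
import math


def nearest_index(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def dist(codeword):
        return sum((a - b) ** 2 for a, b in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def split_vq_encode(vector, codebooks):
    """Quantize consecutive subvectors, each with its own codebook.

    The length of each subvector is taken from the codeword length
    of the corresponding codebook.
    """
    indices, start = [], 0
    for codebook in codebooks:
        dim = len(codebook[0])
        indices.append(nearest_index(vector[start:start + dim], codebook))
        start += dim
    return indices


# Toy example (assumed values): a 4-dimensional vector split into two
# 2-dimensional subvectors, coded with 2 bits and 1 bit respectively.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],  # 4 codewords = 2 bits
    [[0.5, 0.5], [-0.5, -0.5]],                          # 2 codewords = 1 bit
]
vector = [0.9, 0.1, -0.4, -0.6]
indices = split_vq_encode(vector, codebooks)
bits = sum(int(math.log2(len(cb))) for cb in codebooks)
print(indices, bits)  # [0, 1] 3
```

Giving more bits to the subvectors that carry more energy is one common way to choose the assignment; whether the paper does so is not stated here.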
Third-order cumulant-based Wiener filtering algorithm applied to robust speech recognition
In previous works [5], [6], we studied speech enhancement algorithms based on the iterative Wiener filtering method of Lim and Oppenheim [2], in which the AR spectral estimate of the speech is obtained from a second-order analysis. In our algorithms, by contrast, the AR estimation is based on cumulant analysis. This work extends earlier papers by the authors: a cumulant-based Wiener filtering algorithm (AR3_IF) is applied to robust speech recognition. A low-complexity version of this algorithm is tested in the presence of bathroom water noise, and its performance is compared with that of the classical spectral subtraction method. Results are presented for training of the speech recognition system (HTK-MFCC) under both clean and noisy conditions. These results show that applying the AR3_IF algorithm within a speech recognition task reduces sensitivity to water noise. Peer Reviewed. Postprint (published version)
Comparison of different order cumulants in a speech enhancement system by adaptive Wiener filtering
The authors study speech enhancement algorithms based on the iterative Wiener filtering method of Lim and Oppenheim (1978), in which the AR spectral estimate of the speech is obtained from a second-order analysis. In their algorithms, by contrast, the AR estimation is based on a cumulant (third- and fourth-order) analysis. The authors compare the behavior of the cumulant algorithms with that of the classical autocorrelation algorithm. Results are presented for the noise that allows the greatest improvement (additive white Gaussian noise) and for the noises that lead to the smallest (diesel engine and reactor noise). An exhaustive empirical test shows that the cumulant algorithms outperform the original autocorrelation algorithm, especially at low SNR. Peer Reviewed. Postprint (published version)
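For orientation, the core of any Wiener filter of this kind is a per-frequency gain computed from the speech and noise power spectra. The Python sketch below shows only that textbook step, H = Ps / (Ps + Pn). It is not the authors' algorithm: the function name and the sample values are hypothetical, and the iterative re-estimation of the AR speech spectrum, whether by autocorrelation or by cumulants, is deliberately left out.

```python
def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain H = Ps / (Ps + Pn) from two power spectra.

    Both inputs are assumed to be non-negative power estimates on the
    same frequency grid; bins with no power at all get a gain of 0.
    """
    gains = []
    for ps, pn in zip(speech_psd, noise_psd):
        total = ps + pn
        gains.append(ps / total if total > 0 else 0.0)
    return gains


# Assumed toy spectra: strong speech, equal speech and noise, noise only.
print(wiener_gain([9.0, 1.0, 0.0], [1.0, 1.0, 1.0]))  # [0.9, 0.5, 0.0]
```

In an iterative scheme, the filtered output would be used to re-estimate the speech spectrum before the gain is recomputed.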
Some robust speech enhancement techniques using higher order AR estimation
Peer Reviewed. Postprint (published version)
Speech enhancement by adaptive Wiener filtering based on cumulant AR modelling
Peer Reviewed. Postprint (published version)
Predicción lineal de la parte causal de la autocorrelación para la identificación del locutor en ambientes ruidosos
Recently, a new parametrization technique based on the AR modelling of the one-sided autocorrelation sequence (OSALPC) has been shown to be attractive for speech recognition because of its simplicity and its high recognition performance in noisy conditions. In this paper, that parametrization technique is proposed for speaker identification in noisy environments. Experimental results obtained with a new speaker identification system based on the statistics of the cepstral vectors show that OSALPC also achieves much better results than standard parametrization techniques. Peer Reviewed. Postprint (published version)